

Section: New Results

Mastering Heterogeneous Platforms

Participants : Cedric Augonnet, Olivier Aumage, Nicolas Collin, Ludovic Courtès, Nathalie Furmento, Sylvain Henry, Andra Hugo, Raymond Namyst, Cyril Roelandt, Corentin Rossignon, Ludovic Stordeur, Samuel Thibault, Pierre-André Wacrenier.

  • We continued our work on extending StarPU to master the exploitation of heterogeneous platforms.

  • We have released version 1.0.0 of StarPU, which is now considered a stable project that many collaborators can base their work on.

  • We have extended our lightweight DSM over MPI to support caching data [17], which dramatically reduces data transfers for classical applications.

  • We have extended the StarPU scheduler to let the application provide several implementations of the same function for a given architecture; the scheduler then chooses an implementation according to actually measured performance, energy consumption, etc.
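    The idea can be sketched as follows. This is a self-contained toy illustration of measured-cost-based implementation selection, not StarPU's actual codelet API; all names (`codelet`, `pick_impl`, the two kernel variants and their costs) are hypothetical.

    ```c
    /* Toy sketch: a "codelet" groups several implementations of the
     * same function; a scheduler dispatches to whichever variant the
     * runtime measured to be cheapest. NOT the real StarPU interface. */
    #include <stdio.h>

    typedef void (*kernel_fn)(const double *in, double *out, int n);

    /* Two hypothetical implementations of the same kernel. */
    static void scale_naive(const double *in, double *out, int n) {
        for (int i = 0; i < n; i++) out[i] = 2.0 * in[i];
    }
    static void scale_unrolled(const double *in, double *out, int n) {
        int i;
        for (i = 0; i + 1 < n; i += 2) {
            out[i] = 2.0 * in[i];
            out[i + 1] = 2.0 * in[i + 1];
        }
        for (; i < n; i++) out[i] = 2.0 * in[i];
    }

    struct codelet {
        kernel_fn impls[2];
        double measured_cost[2];  /* filled in by calibration runs */
    };

    /* Scheduler decision: route the call to the cheaper variant. */
    static kernel_fn pick_impl(const struct codelet *cl) {
        return cl->measured_cost[0] <= cl->measured_cost[1]
                   ? cl->impls[0] : cl->impls[1];
    }

    int main(void) {
        struct codelet cl = {
            .impls = { scale_naive, scale_unrolled },
            /* Pretend calibration measured these per-call costs (us). */
            .measured_cost = { 12.0, 7.5 },
        };
        double in[4] = {1, 2, 3, 4}, out[4];
        pick_impl(&cl)(in, out, 4);  /* dispatches to scale_unrolled */
        printf("%.0f %.0f %.0f %.0f\n", out[0], out[1], out[2], out[3]);
        return 0;
    }
    ```

    In StarPU itself the selection criterion can also be energy consumption rather than time, as noted above; the mechanism is the same with a different measured cost.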

  • We have collaborated with a Computer Graphics research team in the MediaGPU project to make it possible to graphically render results of StarPU computations directly.

  • Work has been initiated to integrate StarPU and SimGrid for the SONGS project, which will make it possible to simulate application execution on heterogeneous architectures, and thus to easily experiment with scheduling strategies.

  • We have extended StarPU with a protocol that lets it run with a master-slave model, which made it easy to port StarPU to the Intel SCC and Intel Xeon Phi processors, and will enable easy load-balancing support over MPI.

  • We have extended StarPU to allow multiple parallel codes to run concurrently with minimal interference. Such parallel codes run within scheduling contexts that provide confined execution environments which can be used to partition computing resources. Scheduling contexts can be dynamically resized to optimize the allocation of computing resources among concurrently running libraries. We introduced a hypervisor that automatically expands or shrinks contexts using feedback from the runtime system (e.g. resource utilization).

    We demonstrated the relevance of our approach using benchmarks invoking multiple high performance linear algebra kernels simultaneously on top of heterogeneous multicore machines. We showed that our mechanism can dramatically improve the overall application run time (-34%), most notably by reducing the average cache miss ratio (-50%).
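    The context/hypervisor mechanism can be sketched as follows. This is a self-contained toy model of the idea, not StarPU's real scheduling-context interface; the structures, field names, and utilization figures are hypothetical.

    ```c
    /* Toy sketch of scheduling contexts and a resizing hypervisor.
     * Each context owns a number of workers; using utilization
     * feedback, the hypervisor shrinks the least-loaded context and
     * expands the most-loaded one. NOT the real StarPU interface. */
    #include <stdio.h>

    struct sched_ctx {
        const char *name;
        int nworkers;        /* workers currently assigned  */
        double utilization;  /* runtime feedback in [0, 1]  */
    };

    /* One hypervisor step: move a worker from the least-loaded
     * context to the most-loaded one. */
    static void hypervisor_step(struct sched_ctx *ctxs, int n) {
        int lo = 0, hi = 0;
        for (int i = 1; i < n; i++) {
            if (ctxs[i].utilization < ctxs[lo].utilization) lo = i;
            if (ctxs[i].utilization > ctxs[hi].utilization) hi = i;
        }
        if (lo != hi && ctxs[lo].nworkers > 1) {
            ctxs[lo].nworkers--;   /* shrink the idle context  */
            ctxs[hi].nworkers++;   /* expand the busy context  */
        }
    }

    int main(void) {
        /* Two libraries running concurrently in confined contexts. */
        struct sched_ctx ctxs[2] = {
            { "dgemm", 6, 0.95 },  /* saturated context        */
            { "fft",   6, 0.40 },  /* under-utilized context   */
        };
        hypervisor_step(ctxs, 2);
        printf("%s:%d %s:%d\n", ctxs[0].name, ctxs[0].nworkers,
                                ctxs[1].name, ctxs[1].nworkers);
        return 0;
    }
    ```

    The actual hypervisor uses richer runtime feedback than a single utilization figure, but the design point is the same: resizing decisions are driven by measurements rather than by a static partitioning.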

  • We have improved [15] the OpenCL implementation on top of StarPU (SOCL) to allow applications to use StarPU's scheduling contexts through OpenCL's contexts and to explicitly schedule some kernels to enhance performance. Moreover, SOCL fully supports the OpenCL ICD extension and can now be dynamically selected amongst the other available platforms, which makes it easier to use.

  • We have continued collaborations on applications on top of StarPU with the University of Mons [14], the University of Vienna [20], the University of Linköping, the University of Tsukuba, TOTAL, the CEA INAC in Grenoble, and the BRGM, a French public institution in Earth sciences.

  • In joint work with the French SME CAPS entreprise, as part of the ANR ProHMPT project, we have demonstrated a proof-of-concept framework enabling three kinds of application code — native StarPU code, Magma/StarPU code, and HMPP/StarPU code annotated with HMPP's directives — to integrate and cooperate on a computation as a single coherent application.

  • As part of the HPC-GA project, we initiated a preliminary study with the Federal University of Rio Grande do Sul (UFRGS), Brazil, to cooperate on modeling common computing kernel tasks and to potentially use kernel models designed at UFRGS within StarPU's task cost evaluation framework.

  • As part of the partnership with Total, and in relation to StarPU's task-scheduling work, we have explored solutions to semi-automatically adapt the granularity of elementary tasks to the available computing resources.